Tested in R version 3.6.0.
Raw data were quality controlled with FastQC. The reads have high quality for all samples. Adapters were removed with Trimgalore and mapped with STAR[1] to the mm10 genome. STAR outputs counts for all genomic features/tags/genes (option: –quantMode GeneCounts). These are the raw data imported into this R pipeline which uses the edgeR[2] framework for RNAseq analysis (see ‘Differential Expression’ for more details). First, however, genes with low counts and unlikely to be translated (biologically meaningful) and also without enough counts for a reliable statistical judgement are removed. We require genes to have greater than (10/minimum sample library size in millions) counts per million.
The DGE analysis was performed using the edgeR framework. The TMM normalization for library size and composition bias was applied to the count-filtered data (not the VOOM normalized). This was followed by the models indicated in each contrast tab and tested using the quasi-likelihood t-test relative to a threshold (TREAT)[4] using a log2(1.5) threshold for differential expression between groups. P-values were then adjusted using the BH method. Genes were termed significant with FDR < 0.05. These significant genes were used in the Functional Enrichment using the ClusterProfiler R package. Both the Reactome Pathway and Gene Ontologies were tested for enrichment.
More detailed RNAseq explanation for users Most of the following text describing the RNAseq workflow was copied directly from the following publications to give a summary overview of the workflow:
EdgeR uses the negative binomial (NB) distribution to model the read counts for each gene in each sample. The dispersion parameter of the NB distribution accounts for variability between biological replicates. edgeR estimates an empirical Bayes moderated dispersion for each individual gene. It also estimates a common dispersion, which is a global dispersion estimate averaged over all genes, and a trended dispersion where the dispersion of a gene is predicted from its abundance. The estimation is robustified against potential outlier genes. For RNA-seq studies, the NB dispersions tend to be higher for genes with very low counts. The dispersion trend tends to decrease smoothly with abundance and to asymptotic to a constant value for genes with larger counts. From our past experience, the asymptotic value for the BCV tends to be in range from 0.05 to 0.2 for genetically identical mice or cell lines, whereas somewhat larger values (> 0.3) are observed for human subjects.
The NB model can be extended with quasi-likelihood (QL) methods to account for gene-specific variability from both biological and technical sources. Under the QL framework, the NB dispersion trend is used to describe the overall biological variability across all genes, and gene-specific variability above and below the overall level is picked up by the QL dispersion. The raw QL dispersion estimates are squeezed towards a global trend, and this moderation reduces the uncertainty of the estimates and improves testing power. The extent of the squeezing is governed by the value of the prior df estimated from the data. Large prior df estimates indicate that the QL dispersions are less variable between genes, meaning that strong EB moderation should be performed. Smaller prior df estimates indicate that the true unknown dispersions are highly variable, so weaker moderation towards the trend is appropriate. In general, if there are a large number of samples and/or high variability, then the QL dispersions are not squeezed very heavily from the raw values. If there are a low number of samples and/or low variability, then the dispersions will be squeezed more heavily.
The QL F-test is then used to find differentially expressed genes. While the likelihood ratio test is a more obvious choice for inferences with GLMs, the QL F-test is preferred as it reflects the uncertainty in estimating the dispersion for each gene. It provides more robust and reliable error rate control even when the number of replicates is small.
Programs:
Citations:
| Var1 | Freq |
|---|---|
| LKB1-/- | 3 |
| LKB1+/+ | 3 |
| sample | group | genotype | condition |
|---|---|---|---|
| RJ_RNA_1 | LKB1_KO | LKB1-/- | LKB1-/- |
| RJ_RNA_11 | LKB1_KO | LKB1-/- | LKB1-/- |
| RJ_RNA_21 | LKB1_KO | LKB1-/- | LKB1-/- |
| RJ_RNA_6 | LKB1_WT | LKB1+/+ | LKB1+/+ |
| RJ_RNA_16 | LKB1_WT | LKB1+/+ | LKB1+/+ |
| RJ_RNA_26 | LKB1_WT | LKB1+/+ | LKB1+/+ |
## The following samples have no meta data:
##
## The following samples have no count data:
## Resolving...
## Raw Feature Counts Summary (Millions):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.44 17.15 24.71 25.06 33.90 36.90
| millions_of_counts | sample | |
|---|---|---|
| RJ_RNA_1 | 12.44183 | RJ_RNA_1 |
| RJ_RNA_11 | 18.75102 | RJ_RNA_11 |
| RJ_RNA_16 | 30.67760 | RJ_RNA_16 |
| RJ_RNA_21 | 34.97255 | RJ_RNA_21 |
| RJ_RNA_26 | 36.89899 | RJ_RNA_26 |
| RJ_RNA_6 | 16.61617 | RJ_RNA_6 |
## Total tags:
## 50600
## Total tags with >0 counts:
## 24039
## Derived the cpm filter from a minimum of 10 counts in the smallest library
## Filters:
## minimum 0.8037402 cpm in 3 sample
## Filtered tags:
## 12302
We use a VOOM normalization to visualize the data. This transforms on a log2 scale.
Publication describing VOOM: Law, C. W., Chen, Y., Shi, W., & Smyth, G. K. (2014). Voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology, 15(2), 1–17. https://doi.org/10.1186/gb-2014-15-2-r29
FDR Filter: 0.05 .
logFC Filter 0.58 .
| groupLKB1_KO | groupLKB1_WT | |
|---|---|---|
| RJ_RNA_1 | 1 | 0 |
| RJ_RNA_6 | 0 | 1 |
| RJ_RNA_11 | 1 | 0 |
| RJ_RNA_16 | 0 | 1 |
| RJ_RNA_21 | 1 | 0 |
| RJ_RNA_26 | 0 | 1 |
| KO_vs_WT = groupLKB1_KO - groupLKB1_WT | |
|---|---|
| groupLKB1_KO | 1 |
| groupLKB1_WT | -1 |
| group | lib.size | norm.factors | |
|---|---|---|---|
| RJ_RNA_1 | 1 | 12424902 | 1.0014461 |
| RJ_RNA_6 | 1 | 16599141 | 0.9836468 |
| RJ_RNA_11 | 1 | 18728147 | 1.0032096 |
| RJ_RNA_16 | 1 | 30643044 | 0.9778989 |
| RJ_RNA_21 | 1 | 34919959 | 1.0551082 |
| RJ_RNA_26 | 1 | 36860477 | 0.9807325 |
In the MDS plot, the distance between each pair of samples can be interpreted as the leading log-fold change between the samples for the genes that best distinguish that pair of samples. By default, leading fold-change is defined as the root-mean-square of the largest 500 log2-fold changes between that pair of samples.
Common dispersion: 0.01953263 .
Biological Coefficient of Variation: 0.1397592 .
Typical values for the common BCV (square-root-dispersion) for datasets arising from well-controlled experiments are 0.4 for human data, 0.1 for data on genetically identical model organisms or 0.01 for technical replicates.
Total tags: 12302 .
Tags with FDR<0.05: 2689 .
Differentially expressed tags (FDR<0.05 & logFC>0.58): 2689 .
| logFC | unshrunk.logFC | logCPM | PValue | FDR | ens_gene | ext_gene | predicted_function | chromosome_name | start_position | end_position | transcript_length | entrezgene_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| -9.962930 | -9.980276 | 7.709535 | 0 | 1e-07 | ENSMUSG00000023092 | Fhl1 | four and a half LIM domains 1 [Source:MGI Symbol;Acc:MGI:1298387] | X | 56731787 | 56793346 | 2356 | 14199 |
| 6.569521 | 6.572131 | 7.026823 | 0 | 2e-07 | ENSMUSG00000047250 | Ptgs1 | prostaglandin-endoperoxide synthase 1 [Source:MGI Symbol;Acc:MGI:97797] | 2 | 36230426 | 36252272 | 2867 | 19224 |
| 10.067015 | 10.131348 | 5.933270 | 0 | 2e-07 | ENSMUSG00000031740 | Mmp2 | matrix metallopeptidase 2 [Source:MGI Symbol;Acc:MGI:97009] | 8 | 92827291 | 92853420 | 3078 | 17390 |
| 5.299084 | 5.301919 | 5.632772 | 0 | 2e-07 | ENSMUSG00000061353 | Cxcl12 | chemokine (C-X-C motif) ligand 12 [Source:MGI Symbol;Acc:MGI:103556] | 6 | 117168535 | 117181367 | 1878 | 20315 |
| -7.525936 | -7.546090 | 5.055786 | 0 | 2e-07 | ENSMUSG00000046572 | Zfp518b | zinc finger protein 518B [Source:MGI Symbol;Acc:MGI:2140750] | 5 | 38668484 | 38684826 | 6804 | 100515 |
| 5.932290 | 5.935486 | 6.089325 | 0 | 2e-07 | ENSMUSG00000033436 | Armcx2 | armadillo repeat containing, X-linked 2 [Source:MGI Symbol;Acc:MGI:1914666] | X | 134804145 | 134809221 | 3007 | 67416 |
| -4.376064 | -4.376877 | 6.526717 | 0 | 2e-07 | ENSMUSG00000030220 | Arhgdib | Rho, GDP dissociation inhibitor (GDI) beta [Source:MGI Symbol;Acc:MGI:101940] | 6 | 136923655 | 136941899 | 1153 | 11857 |
| -11.776919 | -11.894306 | 6.817376 | 0 | 2e-07 | ENSMUSG00000001819 | Hoxd13 | homeobox D13 [Source:MGI Symbol;Acc:MGI:96205] | 2 | 74668310 | 74671599 | 2483 | 15433 |
| 6.386351 | 6.387273 | 8.347684 | 0 | 2e-07 | ENSMUSG00000026249 | Serpine2 | serine (or cysteine) peptidase inhibitor, clade E, member 2 [Source:MGI Symbol;Acc:MGI:101780] | 1 | 79794197 | 79861180 | 2210 | 20720 |
| -5.605983 | -5.608306 | 6.241398 | 0 | 2e-07 | ENSMUSG00000000184 | Ccnd2 | cyclin D2 [Source:MGI Symbol;Acc:MGI:88314] | 6 | 127125162 | 127152193 | 6330 | 12444 |
| 5.570574 | 5.576739 | 4.779429 | 0 | 2e-07 | ENSMUSG00000027339 | Rassf2 | Ras association (RalGDS/AF-6) domain family member 2 [Source:MGI Symbol;Acc:MGI:2442060] | 2 | 131989415 | 132030258 | 8233 | 215653 |
| -4.853919 | -4.855760 | 5.822455 | 0 | 2e-07 | ENSMUSG00000029648 | Flt1 | FMS-like tyrosine kinase 1 [Source:MGI Symbol;Acc:MGI:95558] | 5 | 147561604 | 147726011 | 6895 | 14254 |
| 6.707474 | 6.720459 | 4.853428 | 0 | 2e-07 | ENSMUSG00000044337 | Ackr3 | atypical chemokine receptor 3 [Source:MGI Symbol;Acc:MGI:109562] | 1 | 90203980 | 90216751 | 3065 | 12778 |
| 9.480325 | 9.594091 | 4.537504 | 0 | 2e-07 | ENSMUSG00000033361 | Prrg3 | proline rich Gla (G-carboxyglutamic acid) 3 (transmembrane) [Source:MGI Symbol;Acc:MGI:2685214] | X | 71962624 | 71972722 | 344 | 208748 |
| 4.293728 | 4.295360 | 5.422149 | 0 | 2e-07 | ENSMUSG00000028517 | Plpp3 | phospholipid phosphatase 3 [Source:MGI Symbol;Acc:MGI:1915166] | 4 | 105157347 | 105232764 | 3159 | 67916 |
| 8.871311 | 8.873712 | 9.440891 | 0 | 2e-07 | ENSMUSG00000021614 | Vcan | versican [Source:MGI Symbol;Acc:MGI:102889] | 13 | 89655312 | 89742509 | 8332 | 13003 |
| 6.264677 | 6.266776 | 7.043514 | 0 | 2e-07 | ENSMUSG00000074743 | Thbd | thrombomodulin [Source:MGI Symbol;Acc:MGI:98736] | 2 | 148404466 | 148408188 | 3723 | 21824 |
| 7.429860 | 7.439473 | 6.017043 | 0 | 3e-07 | ENSMUSG00000018417 | Myo1b | myosin IB [Source:MGI Symbol;Acc:MGI:107752] | 1 | 51749765 | 51916071 | 5052 | 17912 |
| 4.218141 | 4.218384 | 8.102869 | 0 | 3e-07 | ENSMUSG00000031375 | Bgn | biglycan [Source:MGI Symbol;Acc:MGI:88158] | X | 73483602 | 73495933 | 828 | 12111 |
| 6.755702 | 6.758677 | 7.016033 | 0 | 3e-07 | ENSMUSG00000036334 | Igsf10 | immunoglobulin superfamily, member 10 [Source:MGI Symbol;Acc:MGI:1923481] | 3 | 59316735 | 59344394 | 8484 | 242050 |
| 5.551056 | 5.559305 | 4.346965 | 0 | 3e-07 | ENSMUSG00000041444 | Arhgap32 | Rho GTPase activating protein 32 [Source:MGI Symbol;Acc:MGI:2450166] | 9 | 32116136 | 32268446 | 13568 | 330914 |
| 10.469977 | 10.552187 | 5.993284 | 0 | 3e-07 | ENSMUSG00000030671 | Pde3b | phosphodiesterase 3B, cGMP-inhibited [Source:MGI Symbol;Acc:MGI:1333863] | 7 | 114415281 | 114539251 | 6573 | 18576 |
| 5.300661 | 5.301928 | 6.796958 | 0 | 4e-07 | ENSMUSG00000022425 | Enpp2 | ectonucleotide pyrophosphatase/phosphodiesterase 2 [Source:MGI Symbol;Acc:MGI:1321390] | 15 | 54838901 | 54952892 | 3095 | 18606 |
| -5.148191 | -5.152652 | 4.865875 | 0 | 4e-07 | ENSMUSG00000024512 | Dynap | dynactin associated protein [Source:MGI Symbol;Acc:MGI:1922827] | 18 | 70240429 | 70244582 | 1225 | 75577 |
| -3.760112 | -3.760807 | 6.131435 | 0 | 4e-07 | ENSMUSG00000001123 | Lgals9 | lectin, galactose binding, soluble 9 [Source:MGI Symbol;Acc:MGI:109496] | 11 | 78962974 | 78984946 | 1520 | 16859 |
GSEA testing was performed with ClusterProfiler’s enrichGO function.
GSEA testing was performed with ClusterProfiler’s GSEA function.
[1] “Significantly differentially expressed genes are k-means clustered (k=6).” [1] “GO category enrichment is performed on each of these clusters.”